Contour completion for augmenting surface reconstructions

ABSTRACT

Surface reconstruction contour completion embodiments are described which provide dense reconstruction of a scene from images captured from one or more viewpoints. Both a room layout and the full extent of partially occluded objects in a room can be inferred using a Contour Completion Random Field model to augment a reconstruction volume. The augmented reconstruction volume can then be used by any surface reconstruction pipeline to show previously occluded objects and surfaces.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of, and claims priority to, U.S. patent application Ser. No. 14/179,642, filed on Feb. 13, 2014, and entitled “CONTOUR COMPLETION FOR AUGMENTING SURFACE RECONSTRUCTIONS”.

BACKGROUND

The availability of commodity depth cameras and sensors such as Microsoft Corporation's Kinect® have enabled the development of methods which can densely reconstruct arbitrary scenes from captured depth images. The task of generating dense 3D reconstructions of scenes from depth images has seen great progress in the last few years. While part of this progress is due to algorithmic improvements, large strides have been made with the adoption of inexpensive depth cameras and the fusion of color and depth signals.

The combined use of depth and color images has been successfully demonstrated for the production of large-scale models of indoor scenes via both offline and online algorithms. Most red, green, blue and depth (RGB+D) reconstruction methods require data that show the scene from a multitude of viewpoints to provide a substantially accurate and complete surface reconstruction.

Accurate and complete surface reconstruction is of special importance in Augmented Reality (AR) applications, which are increasingly being used for both entertainment and commercial purposes. For example, a recently introduced gaming platform asks users to scan an interior scene from multiple angles to construct a model of the scene. Using the densely reconstructed model, the platform overlays graphically generated characters and gaming elements. In another example, furniture retailers can enable customers to visualize how their furniture will look when installed, without having to leave their homes. These AR applications often require a high fidelity dense reconstruction so that simulated physical phenomena, such as lighting, shadows and object interactions, can be produced in a plausible fashion.

SUMMARY

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

In general, surface reconstruction contour completion technique embodiments described herein infer the layout of a complete reconstructed scene and the full extent of partially occluded objects in the reconstructed scene.

In one embodiment of the surface reconstruction contour completion technique, a partially reconstructed three dimensional (3D) scene is completed. A dense partial reconstruction volume of a three dimensional scene is received (for example, from a surface reconstruction pipeline) and surfaces are detected in the partial reconstruction volume. The surfaces are then classified into different types. The classified surfaces are used to infer the scene boundaries and the boundaries of partially occluded surfaces of objects in the partial reconstruction volume. The partial reconstruction volume is then updated to show the inferred boundaries of partially occluded surfaces of objects in the scene and the scene boundaries.

DESCRIPTION OF THE DRAWINGS

The specific features, aspects, and advantages of the disclosure will become better understood with regard to the following description, appended claims, and accompanying drawings where:

FIG. 1 depicts a flow diagram of an exemplary process for practicing one embodiment of the surface reconstruction contour completion technique described herein.

FIG. 2 depicts a flow diagram of another exemplary process for practicing another embodiment of the surface reconstruction contour completion technique described herein.

FIG. 3 depicts a system for implementing one exemplary embodiment of the surface reconstruction contour completion technique described herein.

FIG. 4 depicts a diagram of using one embodiment of the surface reconstruction contour completion technique to augment a reconstruction volume in a surface reconstruction pipeline.

FIG. 5 is a schematic of an exemplary computing environment which can be used to practice various embodiments of the surface reconstruction contour completion technique.

DETAILED DESCRIPTION

In the following description of surface reconstruction contour completion technique embodiments, reference is made to the accompanying drawings, which form a part thereof, and which show by way of illustration examples by which the surface reconstruction contour completion technique embodiments described herein may be practiced. It is to be understood that other embodiments may be utilized and structural changes may be made without departing from the scope of the claimed subject matter.

1.0 Surface Reconstruction Contour Completion

The following sections provide an introduction to the surface reconstruction contour completion technique embodiments described herein, as well as exemplary implementations of processes and an architecture for practicing these embodiments. Details of various embodiments and components are also provided.

As a preliminary matter, some of the figures that follow describe concepts in the context of one or more structural components, variously referred to as functionality, modules, features, elements, etc. The various components shown in the figures can be implemented in any manner. In one case, the illustrated separation of various components in the figures into distinct units may reflect the use of corresponding distinct components in an actual implementation. Alternatively, or in addition, any single component illustrated in the figures may be implemented by plural actual components. Alternatively, or in addition, the depiction of any two or more separate components in the figures may reflect different functions performed by a single actual component.

Other figures describe the concepts in flowchart form. In this form, certain operations are described as constituting distinct blocks performed in a certain order. Such implementations are illustrative and non-limiting. Certain blocks described herein can be grouped together and performed in a single operation, certain blocks can be broken apart into plural component blocks, and certain blocks can be performed in an order that differs from that which is illustrated herein (including a parallel manner of performing the blocks). The blocks shown in the flowcharts can be implemented in any manner.

1.1 Introduction

In general, the surface reconstruction contour completion technique embodiments described herein perform surface reconstruction from one or more limited viewpoints, and “fill in” parts of a scene that are occluded or not visible to the camera or cameras that captured the scene. Basic scene understanding requires high level knowledge of how objects interact and extend in 3D space. While most scene understanding research is concerned with semantics and pixel labeling, relatively little work has gone into inferring object or surface extent, despite the prevalence and elemental nature of this faculty in humans. Online surface reconstruction pipelines, such as, for example, Microsoft Corporation's Kinect® Fusion, are highly suitable for augmented reality (AR) applications, but could benefit from a scene augmentation phase integrated into the pipeline. Embodiments of the surface reconstruction contour completion technique provide such scene augmentation.

Embodiments of the surface reconstruction contour completion technique provide many advantages. Holes and gaps in a reconstruction of a scene are filled in by completing the contours of occluded surfaces in the scene. This makes it unnecessary to capture a scene from multiple viewpoints in order to show the true extent and shape of occluded objects and scene surfaces. Furthermore, augmenting a reconstruction of a scene requires very little effort on the part of a user and occurs in real time. The augmented surface reconstructions created can also be used with many types of surface reconstruction pipelines.

Some surface reconstruction contour completion technique embodiments employ a Contour Completion Random Field (CCRF) model. This model completes or extends contours in 2D scenes. It also infers the extent of a scene and its objects. Furthermore, it provides for the efficient re-integration of the inferred scene elements into a 3D scene representation so that they can be used by a downstream application such as an AR application.

In one embodiment, a partial dense reconstruction of a scene is received that is represented as a point cloud in a voxel grid where each voxel's state can be occupied by a surface, free (of a surface) or unknown. It should be noted that this representation is a very general representation of a surface and can be generated by sampling from many other types of representations of a scene (for example meshes) and the position of the cameras that captured the scene. In one embodiment the technique uses a real-time dense surface mapping and tracking procedure to compute this reconstruction volume, which also assigns a surface normal and truncated signed distance function (TSDF) value to the voxels. Microsoft Corporation's Kinect® Fusion, for example, is such a real-time dense surface mapping and tracking process that can be used. It maintains an active volume in Graphics Processing Unit (GPU) memory, updated regularly with new depth observations from a depth camera or sensor, such as, for example, Microsoft Corporation's Kinect® sensor. Each depth frame is tracked using an iterative closest point procedure, and each depth frame updates the reconstruction volume, which is represented as a truncated signed distance function (TSDF) grid. At any point the TSDF volume may be rendered (for example, by using ray casting) or transformed into an explicit surface (for example, by using a marching cubes method or similar method). It should be noted that while one embodiment of surface contour completion employs a partial scene reconstruction volume created by Microsoft Corporation's Kinect® Fusion and Kinect® camera/sensor, any other type of scene reconstruction method and depth camera can be employed with the technique.

Given the reconstruction volume, in one embodiment surfaces are detected first. For example, planar surfaces in the scene are detected and each one is classified as being part of the scene layout (floor, walls, ceiling) or part of an internal object in the scene. The identities of the planes are then used to extend them by solving a labeling problem using the CCRF model described herein. Unlike pairwise Markov Random Fields, which essentially encourage short boundaries, the CCRF model encourages discontinuities in the labeling to follow detected contour primitives such as lines or curves. The surface reconstruction contour completion technique embodiments described herein use this model both to complete the floor map for the scene and to estimate the extents of planar objects in the room. Finally, the original input volume is augmented to portray the extended and filled scene. It should be noted that while this description primarily discusses planar surfaces, the surface reconstruction contour completion technique embodiments described herein can be applied to many other types of objects as well. For example, a cylindrical object can be detected and the cylindrical surface mapped to a plane, where the completion is done in a similar fashion as described herein. The same applies to many other types of surfaces, such as, for example, cones and spheres.

1.2 Exemplary Processes

As discussed above, Kinect® Fusion is a real-time dense surface mapping and tracking pipeline. It maintains an active volume in GPU memory, updated regularly with new depth observations from a Kinect® camera/sensor. Each depth frame is tracked and updates the reconstruction volume, which is represented as a truncated signed-distance function (TSDF) volume/grid. At any point the TSDF volume may be rendered (for example, by using ray casting) or transformed into an explicit surface using a marching cubes algorithm (or similar algorithm). The surface reconstruction contour completion technique embodiments described herein extend this pipeline by accessing the TSDF volume at certain key frames (e.g., events where substantial new data is added), and augmenting it with inferred boundaries of occluded surfaces. As new depth observations reveal previously occluded voxels, the inferred augmentations are phased out. The surface reconstruction contour completion technique embodiments described herein may be used to augment other surface reconstruction methods with little change.

FIG. 1 depicts an exemplary process 100 for completing a dense partial 3D reconstruction of a scene. In this embodiment, surfaces (e.g., planar surfaces) in a dense partial 3D reconstruction volume of a scene are detected, as shown in block 102. The detected surfaces are then classified into different types of surfaces, as shown in block 104. For example, the detected surfaces can be classified into walls, a ceiling, a floor and internal surfaces.

The dense partial 3D reconstruction volume of the scene is then augmented to show scene boundaries and greater portions of partially occluded surfaces of objects in the scene, using the classified surfaces and a contour completion procedure that completes the contours of occluded surfaces in the scene, as shown in block 106.

FIG. 2 depicts another exemplary process 200. In this embodiment 200, a partially reconstructed three dimensional (3D) scene is completed.

As shown in block 202, a dense partial 3D reconstruction volume of a three dimensional scene is received (for example, from a surface reconstruction pipeline). The partial 3D reconstruction volume can be received from a surface reconstruction pipeline that maintains the partial reconstruction volume of the scene and updates it with new depth observations of the scene. The dense reconstruction volume may be expressed as a volume of voxels (overlaid on a 3D point cloud). Every voxel can be labeled as occupied (an observed surface lies at that voxel), free (there is no observed surface at the voxel), or unknown (it is unknown whether or not there is a surface at that voxel).
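For illustration, such a three-state voxel labeling might be encoded as follows. This is a minimal Python sketch, assuming a NumPy array encoding; the state constants, grid size and example indices are hypothetical rather than part of any particular pipeline:

    import numpy as np

    # Hypothetical three-state voxel labels; the patent does not
    # prescribe a particular encoding.
    FREE, OCCUPIED, UNKNOWN = 0, 1, 2

    # A small reconstruction volume, initialized to "unknown".
    grid = np.full((64, 64, 64), UNKNOWN, dtype=np.uint8)

    # Mark one voxel as holding an observed surface and a neighbor as free.
    grid[10, 20, 30] = OCCUPIED
    grid[10, 20, 31] = FREE

    # The fraction of resolved voxels could, for example, be monitored
    # to decide when enough new data has arrived to re-run completion.
    resolved_fraction = np.mean(grid != UNKNOWN)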

Surfaces are detected in the partial reconstruction volume, as shown in block 204. Surfaces can be detected using a Hough transform, and points can be associated to each of these surfaces. These surfaces can be any type of surface as long as they can be mapped to a plane. The details of detecting surfaces employed in some embodiments of the technique will be discussed in greater detail later.

The detected surfaces are classified into different types of surfaces, as shown in block 206. This can be done, for example, by using semantic labeling and a classifier that classifies the surfaces into scene boundaries such as wall, floor, ceiling and internal surfaces. The surfaces can be classified into semantic classes using a trained classifier which predicts each surface's class using ground truth labels and 3D features. The classifier captures attributes of each surface including its height, size and surface normal distribution. A Random Forest classifier can be used to classify the surfaces, but many other types of classifiers can also be used, such as, for example, Support Vector Machines (SVMs), boosted linear classifiers and so forth. The details of classifying surfaces employed in some embodiments of the technique are discussed in greater detail later.
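As a concrete illustration, such a classifier might be set up as follows. This is a minimal sketch in Python, assuming scikit-learn's RandomForestClassifier; the feature function, class indices and variable names are hypothetical stand-ins for the attributes described above (height, size and surface normal distribution), not the patent's prescribed features:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def surface_features(points, normals):
        # Per-surface feature vector loosely following the description:
        # height in the room, size, and normal distribution statistics.
        height = points[:, 2].mean()        # mean height of surface points
        size = float(len(points))           # point count as a proxy for area
        normal_mean = normals.mean(axis=0)  # dominant orientation
        normal_std = normals.std(axis=0)    # spread of orientations
        return np.hstack([height, size, normal_mean, normal_std])

    # Hypothetical classes: 0 = floor, 1 = wall, 2 = ceiling, 3 = internal.
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    # Training would use feature vectors from ground-truth-labeled surfaces:
    # clf.fit(X_train, y_train)
    # label = clf.predict(surface_features(pts, nrm).reshape(1, -1))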

As shown in block 208, the classified surfaces are used to define the scene boundaries that are in the partial reconstruction volume of the scene. This can be done, for example, by using the contour completion random field (CCRF) model previously discussed. The CCRF model completes the contours of occluded surfaces in the scene by minimizing an energy function that determines whether a point on a contour lies on a surface or not and labels the point accordingly. Two neighboring points that are assigned different labels as to whether they belong to a contour of a surface incur a penalty when determining whether the points belong to the contour of the surface. The contours of occluded surfaces in the scene can be completed using lines and parabolas fitted to visible parts of the contours of the surfaces. An exemplary process for using the CCRF model to complete contours is described in greater detail later.

The boundaries of partially occluded surfaces of objects in the partial reconstruction volume are also inferred, as shown in block 210. Again, this can be done, for example, by using the CCRF model previously discussed. The contours of occluded objects in the scene can be completed using lines and parabolas fitted to visible parts of the contours of the surfaces of the occluded objects.

The partial 3D reconstruction volume is then augmented to show the determined boundaries of partially occluded surfaces of objects in the scene and the scene boundaries, as shown in block 212. An exemplary process for augmenting the partial 3D reconstruction volume is described in greater detail later.

Exemplary processes for practicing the technique having been provided, the following section discusses an exemplary system for practicing the technique.

1.3 An Exemplary System

FIG. 3 provides an exemplary system 300 for practicing embodiments described herein. A surface reconstruction contour completion module 302 resides on a computing device 500 such as is described in greater detail with respect to FIG. 5.

As shown in FIG. 3, a surface detection module 304 of the surface reconstruction contour completion module 302 detects surfaces in an input partial 3D reconstruction volume 306. The partial reconstruction volume 306 can be received from a surface reconstruction pipeline 308 that maintains the partial reconstruction volume of the scene and updates it with new depth observations of the scene.

It should be noted that the partial 3D reconstruction volume 306 can be received from the surface reconstruction pipeline 308 over a network 310 (e.g., if the surface reconstruction pipeline resides on a server or a computing cloud). Alternately, the surface reconstruction pipeline can reside on the same computing device 500 as the surface reconstruction contour completion module 302. (It should be noted that the whole pipeline may lie on a computing cloud. Input in the form of depth videos may be received from one or more clients, fused, and a final completion generated. The final model (or current depth view) may then be sent back to the clients.)

The partial 3D reconstruction volume 306 may be expressed as a volume of voxels (overlaid on a 3D point cloud). Every voxel can be labeled as having an observed surface at that voxel (labeled as occupied), free when there is no observed surface at the voxel, or unknown (it is unknown whether or not there is a surface at that voxel).

A surface classification module 312 classifies the detected surfaces in the partial 3D reconstruction volume 306. Surfaces can be detected using a Hough transform and points can be associated to each of these surfaces. The surface classification module 312 classifies the detected surfaces into different types of surfaces, for example, by using semantic labeling and a classifier that classifies the surfaces into scene boundaries such as wall, floor, ceiling and internal surfaces. The classified surfaces are used to define the scene boundaries in the partial reconstruction volume of the scene. This can be done, for example, by using a contour completion module 314 and the CCRF model 316 previously discussed. The contours of occluded surfaces in the scene can be completed using lines and parabolas fitted to visible parts of the contours of the surfaces.

The boundaries of partially occluded surfaces of objects in the partial reconstruction volume 306 are also determined using the contour completion module 314. Again, this can be done, for example, by using the CCRF model 316 previously discussed. The contours of occluded objects in the scene can also be completed using lines and parabolas fitted to visible parts of the contours of the surfaces of the occluded objects.

The completed contours of the detected and classified surfaces in the partial 3D reconstruction volume 306 are used by an augmentation module 318 to create an augmented 3D reconstruction volume 320 that depicts greater portions of partially occluded surfaces of objects in the scene and the scene boundaries. This augmented 3D reconstruction volume 320 is then fed back into the surface reconstruction pipeline 308. The augmented 3D reconstruction volume 320 can be updated in real time.

1.4 Details and Exemplary Computations

A description of exemplary processes and an exemplary system for practicing the surface reconstruction contour completion embodiments having been provided, the following sections provide a description of an exemplary surface reconstruction pipeline that can be used, along with details and exemplary computations for various surface reconstruction contour completion technique embodiments.

1.4.1 Exemplary Surface Reconstruction Pipeline

As discussed previously, the surface reconstruction contour completion technique embodiments can be used with various surface reconstruction pipelines. One such surface reconstruction pipeline, which is used with one exemplary embodiment of the technique, is Microsoft Corporation's Kinect® Fusion pipeline. A brief description of this pipeline is provided in the paragraphs below in order to give some background information with respect to an exemplary surface reconstruction pipeline that can be used with the technique.

The Kinect® Fusion system reconstructs a single dense surface model with smooth surfaces by integrating the depth data from a depth camera or sensor, such as, for example, Microsoft Corporation's Kinect®, over time from multiple viewpoints. The camera pose (its location and orientation) is tracked as the camera/sensor is moved, and because each frame's pose and how it relates to the others is known, these multiple viewpoints of objects or the environment can be fused (averaged) together into a single reconstruction voxel volume. The voxel volume can be thought of as a large virtual cube in space (the reconstruction volume), located around the scene in the real world, into which depth data (i.e., measurements of where the surfaces are) is integrated as the sensor is moved around.

The Kinect® Fusion processing pipeline involves several stages to go from the raw depth data to a 3D reconstruction. The first stage is a depth map conversion that takes the raw depth data from the Kinect® camera/sensor and converts it into floating point depth data in meters, followed by an optional conversion to an oriented point cloud which consists of 3D points/vertices in the camera coordinate system, and the surface normals (orientation of the surface) at these points.

The second stage calculates the global/world camera pose (its location and orientation) and tracks this pose as the Kinect® sensor moves in each frame using an iterative alignment algorithm, so that the Kinect® Fusion system always knows the current sensor/camera pose relative to the initial starting frame. There are two tracking algorithms in Kinect® Fusion. The first can either be used to align point clouds calculated from the reconstruction with new incoming point clouds from the Kinect® camera depth, or standalone (for example, to align two separate cameras viewing the same scene). The second algorithm provides more accurate camera tracking results when working with a reconstruction volume; however, it may be less robust to objects which move in a scene.

The third stage is the fusing (or “integration”) of the depth data from the known sensor pose into a single volumetric representation of the space around the camera. This integration of the depth data is performed per-frame, continuously, with a running average to reduce noise, yet it still handles some dynamic changes in the scene.

The reconstruction volume can be raycast from a sensor pose (which is typically, but not limited to, the current Kinect® sensor pose), and this resultant point cloud can be shaded to render a visible image of the 3D reconstruction volume.

1.4.2 Input

The surface reconstruction contour completion technique embodiments can be used with various input representations.

In one exemplary embodiment, the 3D reconstruction volume is represented as a cloud of 3D sensed points such as that described above with respect to the Kinect® Fusion pipeline. These points lie on surfaces in the environment. Note that the input may have different representations, such as an incomplete mesh of the environment, or an implicit representation of the surfaces (such as a 3D function that is stored at regular points in space and is positive on one side of the surface and negative on the other). It is possible to extend the following description to different representations, but all such representations can be approximated as a cloud of 3D points.

In one embodiment a voxel grid (over a 3D point cloud) and a set of surface normals are received. Each voxel in the grid represents a surface observation (occupied), free space or unknown, and each occupied voxel has already been assigned a surface normal. While one embodiment of the technique uses a truncated signed distance function (TSDF) and surface normals produced by Kinect® Fusion, any voxel grid that encodes these three states and the corresponding surface normals would suffice.

As will be described in greater detail in the following paragraphs, given this input, the technique embodiments first detect surfaces, for example, planar surfaces, and classify each one as being part of the scene layout (floor, walls, ceiling) or part of an internal object in the scene. Using the identities of each plane, two stages of scene extension are performed. First, a floor map is completed, providing an enclosed 3D model of the room. Second, the extents of surfaces of objects in the room, for example planar surfaces of objects in the room, are determined. Finally, the original input volume is augmented to account for the extended and filled scene. The stages of an exemplary process are demonstrated in FIG. 4. As shown in FIG. 4, block 402, a received 3D partial dense reconstruction volume is analyzed to find the major planes. As shown in block 404, the major planes are classified into ceiling, walls, floor and interior planes. The scene boundaries and the shape of any partially occluded interior objects are inferred, as shown in block 406. The inferred scene boundaries and interior objects are used to produce the complete reconstructed scene, as shown in block 408. Details of exemplary computations for each of the actions in blocks 402 through 408 are described in the paragraphs below.

1.4.3 Plane Detection and Classification

As shown in FIG. 4, block 402, the received partial 3D reconstruction volume is analyzed to find the major planar surfaces. The following paragraphs describe how the technique detects the dominant planes from the partial voxel-based reconstruction. The space of all possible 3D planes is denoted by ℋ, and the set of planes present in the scene is denoted by H. Let the set of all 3D points visible in the scene be denoted by P = {p₁, . . . , p_N}. The technique estimates the most probable labeling for H by minimizing the following energy function:

$H^{*} = \operatorname*{argmin}_{H} \; \sum_{i=1}^{N} f_{i}(H) + \lambda \, \lvert H \rvert \qquad (1)$

where λ is a penalty on the number of planes and f_i is a function that penalizes the number of points not explained by any plane: f_i(H) = min{min_{h ∈ H} δ(p_i, h), λ_b}, where the function δ returns a value of 0 if point p_i falls on plane h and infinity otherwise. Minimizing the first term alone has the trivial degenerate solution where a plane is included for every point p_i in the set H. However, this situation is avoided by the second term of the energy, which acts as a regularizer and adds a penalty that increases linearly with the cardinality of H.

1.4.3.1 Detecting a Set of Surfaces

In one exemplary embodiment of the surface reconstruction contour completion technique a greedy strategy is employed, starting from an empty set and repeatedly adding the element that leads to the greatest energy reduction. This method has been observed to achieve a good approximation. It begins by using a Hough transform to select a finite set of surfaces (e.g., planes). In one embodiment, each 3D point and its surface normal votes for a plane equation parameterized by its azimuth θ, elevation ψ and distance from the origin ρ. Each of these votes is accrued in an accumulator matrix of size A×E×D, where A is the number of azimuth bins, E is the number of elevation bins and D is the number of distance bins (in one embodiment A=128 and E=64 were used, and D was found dynamically by spacing bin edges 5 cm apart between the minimum and maximum point distances). After each point has voted, a non-maximal suppression procedure is run to avoid accepting multiple planes that are too similar.
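The voting step might be sketched as follows. This is a minimal Python illustration, assuming NumPy; the function name, the binning arithmetic and the synthetic example are assumptions, and non-maximal suppression is left out:

    import numpy as np

    def hough_plane_votes(points, normals, n_azimuth=128, n_elev=64,
                          bin_size=0.05):
        # Each oriented point votes for a plane (theta, psi, rho): the
        # azimuth and elevation of its normal, and its signed distance
        # from the origin.
        azimuth = np.arctan2(normals[:, 1], normals[:, 0])        # [-pi, pi]
        elevation = np.arcsin(np.clip(normals[:, 2], -1.0, 1.0))  # [-pi/2, pi/2]
        rho = np.einsum('ij,ij->i', points, normals)

        a_bin = ((azimuth + np.pi) / (2 * np.pi) * (n_azimuth - 1)).astype(int)
        e_bin = ((elevation + np.pi / 2) / np.pi * (n_elev - 1)).astype(int)
        # Distance bins spaced bin_size (5 cm) apart between the minimum
        # and maximum observed distances, found dynamically.
        d_edges = np.arange(rho.min(), rho.max() + bin_size, bin_size)
        d_bin = np.clip(np.digitize(rho, d_edges) - 1, 0, len(d_edges) - 1)

        acc = np.zeros((n_azimuth, n_elev, len(d_edges)), dtype=np.int32)
        np.add.at(acc, (a_bin, e_bin, d_bin), 1)
        return acc  # non-maximal suppression would follow

    # Example with synthetic upward-facing points:
    pts = np.random.rand(1000, 3)
    nrm = np.tile([0.0, 0.0, 1.0], (1000, 1))
    votes = hough_plane_votes(pts, nrm)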

Once a set of candidate surfaces (e.g., planes) has been determined, the technique sorts them in descending order by the number of votes they have received and iteratively associates points to each plane. A point can be associated to a plane if it has not been previously associated to any other plane and if its planar disparity and local surface normal difference are small enough (in one embodiment a planar disparity threshold of 0.1 and an angular disparity threshold of 0.1 were used). As an additional heuristic, each new plane and its associated points are broken into a set of connected components, ensuring that planes are locally connected.
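The association test might look as follows; a minimal sketch, assuming a plane in Hessian normal form (unit normal and offset) and the 0.1 thresholds quoted above, with the connected-components step omitted and all names hypothetical:

    import numpy as np

    def associate_points(points, normals, plane_normal, plane_d,
                         dist_thresh=0.1, angle_thresh=0.1):
        # Planar disparity: perpendicular distance of each point to the plane.
        disparity = np.abs(points @ plane_normal - plane_d)
        # Angular disparity: deviation of each local normal from the
        # plane normal (sign-invariant).
        cos_angle = np.clip(normals @ plane_normal, -1.0, 1.0)
        angle = np.arccos(np.abs(cos_angle))
        # A point joins the plane only if both disparities are small enough.
        return (disparity < dist_thresh) & (angle < angle_thresh)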

1.4.3.2 Classification of Surfaces/Planes with Semantic Labeling

As shown in FIG. 4, block 404, once a set of planes has been determined, each one is classified independently into one of four semantic classes: floor, wall, ceiling and internal. To do so, in one embodiment a Random Forest classifier is trained to predict each plane's class from ground truth labels and 3D features, using a procedure which captures attributes of each plane including its height in the room, size and surface normal distribution. Planes classified as one of floor, wall and ceiling are used for inferring the floor plan and scene boundaries, whereas internal planes are extended and filled in a subsequent step.

One embodiment uses a procedure for fast parabola fitting, described later in greater detail. Starting with an input image, the contact points and contour edges are found. Next, the contact points are aligned and a parabola is fitted using several radii to find the best fit.

1.4.4 Scene Completion

Given the set of detected and classified planes, the surface reconstruction contour completion technique embodiments described herein infer the true extent of the scene (e.g., they obtain an enclosed room structure) and extend interior planes based on evidence from the scene itself, as shown in FIG. 4, block 406.

1.4.4.1 Surface Boundary Completion as a Pixel Labeling Problem

The following paragraphs describe the mathematical computations used in one exemplary embodiment to estimate the boundaries of surfaces (e.g., planes) as seen from a top-down view. Boundary completion is formulated as a pixel labeling problem. Consider a set S of nodes that represent grid locations in the top-down view of the scene. It is assumed that a partial labeling of nodes i ∈ S in the grid can be observed and is encoded by variables y_i, i ∈ S, where y_i = 1, y_i = 0 and y_i = −1 represent that i belongs to the plane, does not belong to the plane, and is of uncertain membership, respectively. Given y, it is desired to estimate the true extent of the plane, which is denoted by x. Specifically, the binary variable x_i is used to encode whether the plane covers the location of node i in the top view: x_i = 1 represents that node i belongs to the plane while x_i = 0 represents that it does not.

The traditional approach for pixel labeling problems is to use a pairwise Markov Random Field (MRF) model. The energy of any labeling x under the pairwise MRF model is defined as:

$E(x) = \sum_{i \in S} \phi_{i}(x_{i}) + \sum_{ij \in N} \phi_{ij}(x_{i}, x_{j}) + K \qquad (2)$

where φ_i encodes the cost of assigning a label x_i, φ_ij are pairwise potentials that encourage neighboring (N) nodes to take the same label, and K is a constant. The unary potential functions force the estimated labels x to be consistent with the observations y, i.e., φ_i(x_i) = ∞ if y_i ≠ −1 and x_i ≠ y_i, and φ_i(x_i) = 0 in all other cases, while the pairwise potentials take the form of an Ising model. The Maximum a Posteriori (MAP) labeling under the model can be computed in polynomial time using graph cuts. However, the results are underwhelming, as the pairwise model does not encode any information about how boundaries should be completed. It simply encourages a labeling that has a small number of discontinuities.

1.4.4.2 Contour Completion Random Field

One embodiment of the technique employs a CCRF model as previously discussed. The mathematical computations for applying the CCRF model to complete contours of occluded surfaces are now described.

It has been observed that although a large number of neighboring pixel pairs take different labels, the majority of these pixels have a consistent appearance. Therefore, potentials are defined that penalize segmentations with a large number of types of figure-ground transitions. This is in contrast to the standard MRF, which penalizes the absolute number of transitions. The CCRF model does not suffer from the short-boundary bias, as the energy does not grow with the number of transitions if they are of the same type, i.e., have the same appearance.

Unlike methods where higher order potentials are defined over disjoint edge groups that are clustered based on appearance, in the CCRF the higher order potentials are defined over overlapping sets of edges where each set follows some simple (low-dimensional) primitive curve shape such as a line or a circle. The energy function for the CCRF model can be written as:

$E(x) = \sum_{i \in S} \phi_{i}(x_{i}) + \sum_{g \in G} \Psi_{g}(x) \qquad (3)$

where Ψ_g are curve completion potentials and G is a set of edge groups, each group g comprising the set of nodes (edges) that follow a single curve. The curve completion potentials have a diminishing returns property. More formally,

$\Psi_{g}(x) = F\left( \sum_{ij \in E_{g}} \psi_{ij}(x_{i}, x_{j}) \right), \qquad (4)$

where E_g is the set of edges that defines the curve or edge group g, and F is a non-decreasing concave function. In experiments, F was defined as an upper-bounded linear function, i.e., F(t) = min{λt, θ}, where λ is the slope of the function and θ is the upper bound. It can be seen that once a few edges are cut (t ≥ θ/λ), the rest of the edges in the group can be cut without any penalty. This behavior of the model does not prevent the boundary in the labeling from including a large number of edges, as long as they belong to the same group (curve).
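The diminishing-returns behavior of F is easy to see numerically. A minimal Python sketch, assuming unit edge costs and hypothetical λ and θ values:

    def curve_completion_penalty(cut_edge_costs, lam=1.0, theta=5.0):
        # Upper-bounded linear penalty F(t) = min(lam * t, theta): the
        # cost grows with the number of cut edges in a group only until
        # it saturates at theta, after which further edges along the
        # same line or parabola are cut for free.
        t = sum(cut_edge_costs)
        return min(lam * t, theta)

    # Cutting 3 unit-cost edges costs 3; cutting 50 still costs only 5.
    print(curve_completion_penalty([1.0] * 3))   # 3.0
    print(curve_completion_penalty([1.0] * 50))  # 5.0

This is exactly why the model avoids the short-boundary bias: a long boundary is cheap if all of its edges follow one detected curve.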

1.4.4.2.1 Transforming Curve Completion Potentials to a Pairwise Model

It has been demonstrated that potentials of the form of equation (4) can be transformed into a sum of pairwise potentials with the addition of auxiliary variables. Concretely, the minimization over the curve completion potentials can be written as the minimization of a submodular pairwise pseudo-boolean function with the addition of one variable per edge and one variable per edge group. The minimization problem over the higher-order curve completion potential (4) can be transformed to

$\Psi_{g}(x) = T + \min_{h_{g}, z} \left\{ \sum_{ij \in E_{g}} \theta_{ij} \left( \left( x_{i} + x_{j} - 2 z_{ij} \right) h_{g} - 2 \left( x_{i} + x_{j} \right) z_{ij} + 4 z_{ij} \right) - T h_{g} \right\} \qquad (5)$

where h_g is the binary auxiliary variable corresponding to the group g, and z_ij, ∀ ij ∈ E_g, are binary auxiliary variables corresponding to the edges that constitute the edge group g.

1.4.4.2.2 Constraining the Edge Groups

In addition to the higher order terms above, a series of constraints on the set of edge groups is introduced. In particular, edge groups can be organized hierarchically, so that a set of possible edge groups shares a parent and only a single child per parent may be active. These constraints are formalized using the binary auxiliary variables:

$E(x) = \sum_{i \in S} \phi_{i}(x_{i}) + \sum_{g \in G} \Psi_{g}(x) \quad \text{s.t.} \quad \sum_{k \in c(g)} h_{k} \leq 1 \;\; \forall g \in G \qquad (6)$

where c(g) denotes the set of child edge groups for each parent g. The addition of these constraints does not alter the inference mechanism, as the technique follows a greedy inference method to minimize the CCRF energy (3) by solving a series of submodular pseudo-boolean function minimization problems using graph cuts. Because the strategy is greedy, the technique simply does not allow a child edge group to be selected if its parent is not already active. The exact nature of these groups is described below.

1.4.5 Defining Edge Groups

As shown in block 406, the surface reconstruction contour completion technique embodiments described herein infer scene boundaries and interior surfaces. In one embodiment, this is done by detecting lines and parabolas along the contours of the known pixels and by hallucinating edge groups. The CCRF model is then used to infer the extent of the surfaces.

In one embodiment two types of edge groups are considered to define the edges of surfaces, namely straight lines and parabolas. While previous work has demonstrated the ability of the Hough transform to detect other shapes, such as circles and ellipses, such high-parameter shapes require substantially more memory and computation. It was found that lines and parabolas are sufficiently flexible to capture most of the cases encountered, and these are used by the surface reconstruction contour completion technique to complete the contours of surfaces.

1.4.5.1 Detecting Lines

To detect lines, in one embodiment a modified Hough transform is used that detects not only lines but also the direction of the transition (plane to free space or vice versa). In one embodiment an accumulator is used with three parameters: ρ, the distance from the origin to the line; θ, the angle between the vector from the origin to the line and the X axis; and a quaternary variable d, which indicates the direction of the transition (both bottom-top and left-right directions). In one embodiment the technique uses 400 angular bins for θ and evenly spaced bins for ρ, one unit apart; the minimum number of votes allowed is set to 10. Following the accumulation of votes, non-maximal suppression is run and an edge group is created for each resulting line.
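A reduced sketch of the voting stage follows, in Python with NumPy. The function name and accumulator layout are assumptions; for brevity the quaternary direction index d is omitted, so this shows only the (ρ, θ) accumulation with the 400 angular bins and vote threshold of 10 quoted above:

    import numpy as np

    def hough_lines(mask, n_theta=400, min_votes=10):
        # mask: 2D boolean image of contour pixels. Each pixel votes for
        # a line (rho, theta); a third index d would additionally record
        # the transition direction in a fuller implementation.
        ys, xs = np.nonzero(mask)
        if xs.size == 0:
            return []
        thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
        rhos = np.rint(xs[:, None] * np.cos(thetas) +
                       ys[:, None] * np.sin(thetas)).astype(int)
        offset = -rhos.min()
        acc = np.zeros((rhos.max() + offset + 1, n_theta), dtype=np.int32)
        for col in range(n_theta):              # accumulate votes per angle
            np.add.at(acc, (rhos[:, col] + offset, col), 1)
        # Keep peaks with at least min_votes; non-maximal suppression
        # would then prune near-duplicate lines.
        peaks = np.argwhere(acc >= min_votes)
        return [(int(r) - offset, float(thetas[c])) for r, c in peaks]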

1.4.5.2 Detecting Parabolas

The standard Hough transform for parabolas requires four parameters. To avoid the computational and memory demands of such a design, a novel and simple heuristic is introduced. First, the technique identifies each point in the input image which falls at the intersection of free space, occupied and unknown pixels. These intersections are referred to as contact points. Furthermore, all pixels occupied by a plane and bordering free space are referred to as contour points.

To close an occluded contour, a parabola either connects a set of contact points or continues at a contact point until it reaches the end of the image. It is noted that only contact points that border the same occlusion region can possibly be bridged. Therefore, a set of contact point pairs ξ is created from which parabolas will be estimated. If multiple contact points all border the same occlusion region, each contact point is paired with its nearest neighbor. For each pair of contact points, a parabola is fit to all of the contour points that immediately border each contact point. Because the parabola may be rotated, these bordering contour points are first rotated so that the normal of the line joining the contact points is aligned with the Y axis. To avoid over- or under-fitting the contour, the contour points are sampled using several radii and the parabola most consistent with the observed contour is kept. If a contact point cannot be paired, or if a pair provides only poor fits, a parabola is fit using each side's immediate contour point neighbors separately. This fast parabola fitting process is summarized in Table 1.

TABLE 1
Fast Parabola Fitting Process

Data: Ternary image Y, set of search radii T
Result: List of edge groups G

  C = FindContactPoints(Y)
  ξ = FindPairsOfContactPoints(C, Y)
  for ν_i, ν_j ∈ ξ do
    initialize vote vector to all zeros: θ_τ = 0 ∀ τ ∈ |T|
    for τ ∈ T do
      P = FindNeighboringContourPoints(ν_i, ν_j, τ)
      P′ = RotatePoints(P)
      α = LeastSquaresFit(P′)
      Q′ = SampleParabola(α)
      Q = RotateBackToImageCoordinates(Q′)
      g = DefineEdgeGroup(Q)
      θ_τ = CountVotesFromEdgeGroup(g, Y)
    end
    Append(G, BestParabola(θ))
  end
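For concreteness, the rotate-fit-rotate-back step at the core of Table 1 might be sketched as follows in Python. This is a minimal illustration, assuming NumPy and a simple least-squares polynomial fit; the function name and fixed sample count are hypothetical, and the radius search and edge-group voting of Table 1 are omitted:

    import numpy as np

    def fit_parabola_between(contact_a, contact_b, contour_pts):
        # Rotate the bordering contour points so the segment joining the
        # two contact points lies along the X axis (its normal along Y),
        # fit y = a*x^2 + b*x + c by least squares, then rotate back.
        pts = np.asarray(contour_pts, dtype=float)
        a = np.asarray(contact_a, float)
        b = np.asarray(contact_b, float)
        mid = (a + b) / 2.0
        ang = np.arctan2(*(b - a)[::-1])
        rot = np.array([[np.cos(-ang), -np.sin(-ang)],
                        [np.sin(-ang),  np.cos(-ang)]])
        local = (pts - mid) @ rot.T
        coeffs = np.polyfit(local[:, 0], local[:, 1], deg=2)
        # Sample the fitted parabola and map samples back to image space.
        xs = np.linspace(local[:, 0].min(), local[:, 0].max(), 50)
        ys = np.polyval(coeffs, xs)
        return np.stack([xs, ys], axis=1) @ rot + mid

    # Example with synthetic contour points lying on a parabola:
    x = np.linspace(-1, 1, 20)
    curve = fit_parabola_between((-1, 0), (1, 0), np.stack([x, x**2], 1))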

1.4.5.3 Hallucinating Edges

While using detected lines or curves may encourage the correct surface boundaries to be inferred in many cases, in others there is no evidence present in the image of how a shape should be completed. Edge groups are therefore hallucinated, whose use in completion encourages shapes that exhibit simple closure and symmetry. More specifically, for each observed line detected, additional parallel edge groups are added on the occluded side of the shape.

1.4.6 Inferring Scene Boundaries

As shown in FIG. 4, block 406, edges of the classified surfaces are determined and scene boundaries and interior surface boundaries are inferred. To summarize, the edge groups are obtained by fitting lines and parabolas to the input image, thus encouraging transitions that are consistent with these edges. As indicated in Equation 6, not all edge groups can be active simultaneously; in particular, any line used to hallucinate a series of edges is considered the parent to its child hallucinated lines. Consequently, at most a single hallucinated line is active at a time.

To extend and fill the scene boundaries, the free space of the input TSDF and the wall planes (predicted by the classifier) are projected onto the floor plane. Given a 2D point cloud induced by these projections, the points are discretized to form a projection image where each pixel y_i takes on the value of free space, wall or unknown. To infer the full scene layout, the CCRF (Equation 3) is applied to infer the values of the unknown pixels. In this case, free space is considered to be the area to be expanded (y_i = 1) and the walls to be the surrounding area to avoid being filled (y_i = 0). The lines and curves of the walls are first detected to create a series of edge groups. Next, the unary potentials are set so that φ_i(x_i = 1) = ∞ if y_i = 0 and φ_i(x_i = 0) = ∞ if y_i = 1. Finally, in one embodiment a slight bias toward assigning free space is added by setting φ_i(x_i = 0) = ε, where ε = 1e−6.
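A minimal sketch of this unary-potential setup follows, assuming a NumPy ternary projection image with the encoding just described (1 = free space, 0 = wall, −1 = unknown). The function name, the finite stand-in for the infinite cost, and the choice to apply the bias only to unknown pixels are illustrative assumptions:

    import numpy as np

    INF = 1e9  # finite stand-in for the infinite unary cost

    def boundary_unaries(proj):
        # proj: 2D ternary image from projecting the TSDF free space and
        # wall planes onto the floor plane.
        h, w = proj.shape
        # phi[..., 0] is the cost of labeling a pixel x = 0 (not free
        # space); phi[..., 1] is the cost of labeling it x = 1 (free).
        phi = np.zeros((h, w, 2))
        phi[proj == 0, 1] = INF     # observed walls must not be filled
        phi[proj == 1, 0] = INF     # observed free space must stay free
        phi[proj == -1, 0] = 1e-6   # slight bias toward free space
        return phi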

1.4.7 Extending Planar Surfaces

As shown in FIG. 4, block 406, once the scene boundary has been completed, the full extent of internal planar surfaces is also inferred. For each internal plane, the TSDF is projected onto the detected 2D plane as follows. First a coordinate basis is found for the plane using Principal Component Analysis (PCA) and the major and minor axes of the plane, M and N respectively, are estimated. Next, an image of size 2N+1×2M+1 is created where the center pixel of the image corresponds to the centroid of the plane. A grid of size 2N+1×2M+1 is sampled along the plane basis, and the TSDF values sampled at each grid location are used to assign each of the image's pixels. If the sampled TSDF value is occupied, y_i is set to 1; if it is free space, y_i is set to 0; and if it is unknown, y_i is set to −1. In practice, several voxels away from the plane are also sampled (along the surface normal direction). This heuristic has the effect of reducing the effects of sensor noise and error from plane fitting.
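The plane-basis construction might be sketched as follows. This is a minimal Python illustration assuming an SVD-based PCA over the plane's inlier points; the function name and the use of maximum absolute projections as the half-extents M and N are assumptions rather than the patent's prescribed computation:

    import numpy as np

    def plane_basis_and_extent(plane_points):
        # PCA over the plane's 3D points: the two leading principal axes
        # give an in-plane coordinate basis; the extents along them stand
        # in for the major and minor axes M and N described above.
        centroid = plane_points.mean(axis=0)
        centered = plane_points - centroid
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        major, minor = vt[0], vt[1]          # rows ordered by variance
        M = int(np.ceil(np.abs(centered @ major).max()))
        N = int(np.ceil(np.abs(centered @ minor).max()))
        return centroid, major, minor, M, N

    # A (2N+1) x (2M+1) grid sampled along (major, minor) around the
    # centroid would then index the TSDF to build the ternary image Y.
    pts = np.random.rand(200, 3)
    pts[:, 2] = 0.0                          # synthetic points on z = 0
    c, u, v, M, N = plane_basis_and_extent(pts)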

Once Y has been created, all lines and parabolas in the image are detected and the necessary lines are hallucinated to create the edge groups. Next, the local potentials are assigned in the same manner as described previously.

1.5 Augmenting the Original Volume

As shown in FIG. 4, block 408, the inferred scene boundaries and interior surfaces are used to augment the 3D reconstruction volume. The result of the scene completion is an enclosed scene boundary and extended interior object surface planes. As the final step in the pipeline, the original imported TSDF is augmented. In one embodiment, for the scene boundary the resulting polyline representing the boundary is simplified, and points along this boundary from floor to ceiling height are sampled. For the interior planes the technique samples points in the extended parts of the planes. For each sampled point (sampled as densely as required, in this case γ), a Bresenham line is traversed in the volume from the voxel closest to the point, in two directions: its normal and the inverse of its normal. For each encountered voxel, the TSDF value is updated with the distance to the surface. If the dimensions of the original volume do not suffice to hold the new scene boundaries, a new larger volume is created and the original TSDF is copied to it before augmenting it.
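The per-point volume update might look as follows. This Python sketch is illustrative only: it substitutes a simple stepped walk along the normal for a full Bresenham traversal, and the rule of keeping the smallest-magnitude signed distance is an assumption about how the update could be realized, not the patent's specified behavior:

    import numpy as np

    def augment_tsdf_along_normal(tsdf, voxel, normal, trunc=5):
        # Walk from the voxel nearest a sampled boundary point in both
        # the normal direction and its inverse, writing the signed
        # distance to the inferred surface into each visited voxel.
        n = np.asarray(normal, float)
        n /= np.linalg.norm(n)
        for step in range(-trunc, trunc + 1):
            p = np.rint(np.asarray(voxel) + step * n).astype(int)
            if np.all(p >= 0) and np.all(p < np.array(tsdf.shape)):
                dist = float(step)  # signed distance in voxel units
                # Keep the value closest to a surface (smallest magnitude).
                if abs(dist) < abs(tsdf[tuple(p)]):
                    tsdf[tuple(p)] = dist

    # Example: a 32^3 volume initialized to the truncation bound.
    vol = np.full((32, 32, 32), 5.0)
    augment_tsdf_along_normal(vol, (16, 16, 16), (0, 0, 1))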

The augmented TSDF, in the originating surface reconstruction pipeline, is continuously updated with new evidence (e.g., as the user moves). Augmented areas are phased out as the voxels which they filled become known.

2.0 Exemplary Usage Scenarios and Alternate Embodiments:

The following paragraphs provide some exemplary usage scenarios and alternate embodiments. Many other usage scenarios and embodiments are possible.

2.1 Gaming Applications

The surface reconstruction contour completion technique embodiments described herein can be used to augment scenes in various gaming applications. For example, a room can be scanned with a depth sensor and a partial 3D reconstruction volume of the room can be created. The technique can then augment the partial 3D reconstruction volume to show partially occluded objects and items. Graphically generated characters and gaming elements can also be overlaid.

2.2 Interior Design/Construction

The surface reconstruction contour completion technique embodiments described herein can be used to augment scenes in various interior design and construction applications. An augmented model of a captured room can be used to show all aspects of the room in order to allow consumers to visualize how their furniture will look when installed in the room.

2.3 3D Printing

In 3D printing, a 3D model is sent to the 3D printer in order for the printer to create a physical object. The surface reconstruction contour completion technique embodiments described herein can be used to create such a model.

2.4 Robotic Applications

In some scenarios indoor robots are used to traverse a room, for example, to find a person or to find an explosive device. The surface reconstruction contour completion technique embodiments described herein can be used to create such a model of a room in order to guide a robot through previously unmapped spaces.

2.5 Object Cutout

In some scenarios a user may want to cut an object out of the existing geometry of a modeled scene or reconstruction volume. The surface reconstruction contour completion technique embodiments described herein can be used to replace object cutouts to complete the reconstruction volume.

3.0 Exemplary Operating Environment:

The surface reconstruction contour completion technique embodiments described herein are operational within numerous types of general purpose or special purpose computing system environments or configurations. FIG. 5 illustrates a simplified example of a general-purpose computer system on which various embodiments and elements of the surface reconstruction contour completion technique, as described herein, may be implemented. It is noted that any boxes that are represented by broken or dashed lines in the simplified computing device 500 shown in FIG. 5 represent alternate embodiments of the simplified computing device. As described below, any or all of these alternate embodiments may be used in combination with other alternate embodiments that are described throughout this document. The simplified computing device 500 is typically found in devices having at least some minimum computational capability such as personal computers (PCs), server computers, handheld computing devices, laptop or mobile computers, communications devices such as cell phones and personal digital assistants (PDAs), multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and audio or video media players.

To allow a device to implement the surface reconstruction contour completion technique embodiments described herein, the device should have a sufficient computational capability and system memory to enable basic computational operations. In particular, the computational capability of the simplified computing device 500 shown in FIG. 5 is generally illustrated by one or more processing unit(s) 510, and may also include one or more graphics processing units (GPUs) 515, either or both in communication with system memory 520. Note that the processing unit(s) 510 of the simplified computing device 500 may be specialized microprocessors (such as a digital signal processor (DSP), a very long instruction word (VLIW) processor, a field-programmable gate array (FPGA), or other micro-controller) or can be conventional central processing units (CPUs) having one or more processing cores.

In addition, the simplified computing device 500 shown in FIG. 5 may also include other components such as a communications interface 530. The simplified computing device 500 may also include one or more conventional computer input devices 540 (e.g., pointing devices, keyboards, audio (e.g., voice) input devices, video input devices, haptic input devices, gesture recognition devices, devices for receiving wired or wireless data transmissions, and the like). The simplified computing device 500 may also include other optional components such as one or more conventional computer output devices 550 (e.g., display device(s) 555, audio output devices, video output devices, devices for transmitting wired or wireless data transmissions, and the like). Note that typical communications interfaces 530, input devices 540, output devices 550, and storage devices 560 for general-purpose computers are well known to those skilled in the art, and will not be described in detail herein.

The simplified computing device 500 shown in FIG. 5 may also include a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by the computer 500 via storage devices 560, and can include both volatile and nonvolatile media that is either removable 570 and/or non-removable 580, for storage of information such as computer-readable or computer-executable instructions, data structures, program modules, or other data. Computer-readable media includes computer storage media and communication media. Computer storage media refers to tangible computer-readable or machine-readable media or storage devices such as digital versatile disks (DVDs), compact discs (CDs), floppy disks, tape drives, hard drives, optical drives, solid state memory devices, random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, magnetic cassettes, magnetic tapes, magnetic disk storage, or other magnetic storage devices.

Retention of information such as computer-readable or computer-executable instructions, data structures, program modules, and the like, can also be accomplished by using any of a variety of the aforementioned communication media (as opposed to computer storage media) to encode one or more modulated data signals or carrier waves, or other transport mechanisms or communications protocols, and can include any wired or wireless information delivery mechanism. Note that the terms “modulated data signal” or “carrier wave” generally refer to a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. For example, communication media can include wired media such as a wired network or direct-wired connection carrying one or more modulated data signals, and wireless media such as acoustic, radio frequency (RF), infrared, laser, and other wireless media for transmitting and/or receiving one or more modulated data signals or carrier waves.

Furthermore, software, programs, and/or computer program products embodying some or all of the various surface reconstruction contour completion technique embodiments described herein, or portions thereof, may be stored, received, transmitted, or read from any desired combination of computer-readable or machine-readable media or storage devices and communication media in the form of computer-executable instructions or other data structures.

Finally, the surface reconstruction contour completion technique embodiments described herein may be further described in the general context of computer-executable instructions, such as program modules, being executed by a computing device. Generally, program modules include routines, programs, objects, components, data structures, and the like, that perform particular tasks or implement particular abstract data types. The surface reconstruction contour completion technique embodiments may also be practiced in distributed computing environments where tasks are performed by one or more remote processing devices, or within a cloud of one or more devices, that are linked through one or more communications networks. In a distributed computing environment, program modules may be located in both local and remote computer storage media including media storage devices. Additionally, the aforementioned instructions may be implemented, in part or in whole, as hardware logic circuits, which may or may not include a processor.

4.0 Other Embodiments

It should also be noted that any or all of the aforementioned alternate embodiments described herein may be used in any combination desired to form additional hybrid embodiments. Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. The specific features and acts described above are disclosed as example forms of implementing the claims.

What is claimed is:
 1. A computer-implemented process for completing a partially reconstructed three dimensional (3D) scene, comprising: receiving a dense partial reconstruction volume of a three dimensional scene; detecting surfaces in the partial reconstruction volume; classifying the surfaces into different types of surfaces; determining the scene boundaries of the partial reconstruction volume of the scene using the classified surfaces; determining boundaries of partially occluded surfaces of objects in the partial reconstruction volume using the classified surfaces; and updating the partial reconstruction volume to show the determined boundaries of partially occluded surfaces of objects in the scene and the scene boundaries.
 2. The computer-implemented process of claim 1 wherein the partial reconstruction is received from a surface reconstruction pipeline that maintains the partial reconstruction volume of the scene that is updated with new depth observations of the scene.
 3. The computer-implemented process of claim 1, further comprising using a contour completion process to update the contours of scene boundaries and the boundaries of the partially occluded surfaces of objects in the partial reconstruction volume.
 4. The computer-implemented process of claim 3 wherein the contours of surfaces in the scene and objects in the scene are completed by using lines and parabolas fitted to visible parts of the contours of the surfaces.
 5. The computer-implemented process of claim 4, further comprising fitting parabolas to visible parts of contours of surfaces by: projecting the visible parts of the contours of the surfaces onto a two dimensional image; identifying contact points in the image which fall at an intersection of free space, occupied and unknown pixels; finding contour points in the image that are on a contour of a planar surface and bordering free space; and enclosing an occluded contour by fitting a parabola so that it either connects to a set of contact points or continues from a contact point until it reaches an end of the image.
 6. The computer-implemented process of claim 3, wherein the contour completion process minimizes an energy function that determines whether a point lies on a contour of a surface or not and labels the point as being on the contour of the surface or not; and wherein two neighboring points that are assigned different labels as to belonging to a contour of a surface or not are assigned a penalty when determining whether the points belong to the contour of the surface.
 7. The computer-implemented process of claim 1 wherein the dense partial reconstruction volume of the scene is expressed as a volume of voxels.
 8. The computer-implemented process of claim 7 wherein every voxel is labeled as occupied when having an observed surface at that voxel, free state when there is no observed surface at that voxel, or unknown.
 9. The computer-implemented process of claim 1 wherein the dense partial reconstruction volume of the scene is expressed as a truncated signed distance function grid of voxels with a surface normal.
 10. The computer-implemented process of claim 1 wherein each surface is classified into a semantic class.
 11. The computer-implemented process of claim 10 wherein each surface is classified into a semantic class of floor, wall, ceiling and internal, wherein surfaces classified as one of floor, wall and ceiling are used to infer scene boundaries, and wherein internal surfaces are extended and filled.
 12. The computer-implemented process of claim 10 wherein the planar surfaces are classified into the semantic classes using a trained classifier which predicts each surface's class using ground truth labels and 3D features and which captures attributes of each surface including its height, size and surface normal distribution.
 13. The computer-implemented process of claim 12 wherein the classifier is a Random Forest classifier.
 14. A computer-implemented process for completing a dense partial 3D reconstruction of a scene, comprising: detecting surfaces in the dense partial 3D reconstruction volume of a scene; classifying the detected surfaces into different types of surfaces; and augmenting the dense partial 3D reconstruction volume of the scene to show greater portions of partially occluded surfaces of objects in the scene and the scene boundaries using the classified surfaces and a contour completion procedure.
 15. The computer-implemented process of claim 14, further comprising continuously updating the augmented 3D reconstruction volume with new data as the scene changes.
 16. The computer-implemented process of claim 14, further comprising phasing out augmented portions of the augmented partial 3D reconstruction volume as the augmented portions of the 3D reconstruction volume become known.
 17. A system for completing a partial 3D reconstruction of a scene, comprising: one or more computing devices; and a computer program having program modules executable by the one or more computing devices, the one or more computing devices being directed by the program modules of the computer program to, classify planar surfaces detected in a partial 3D reconstruction of a scene; and update the partial 3D reconstruction to show greater portions of partially occluded surfaces of objects in the scene and the scene boundaries by using the classified planar surfaces to complete visible contours of the planar surfaces.
 18. The system of claim 17 wherein the detected planar surfaces are updated in real time.
 19. The system of claim 18 wherein the 3D partial reconstruction of a scene is received via a network from an on-line surface reconstruction pipeline.
 20. The system of claim 17 wherein the visible contours of the planar surfaces are completed by fitting one or more parabolas.